Abstract

Background: Combinations of different agents are widely used in the clinic to combat complex diseases with improved
efficacy and reduced side effects. However, identifying effective drug combinations remains challenging, because
the huge number of possible combinations among candidate drugs makes exhaustive screening impractical.

Results: In this work, we construct a 'drug cocktail network' using all the known effective drug combinations 
extracted from the Drug Combination Database (DCDB), and propose a network-based approach to investigate 
drug combinations. Our results show that the agents in an effective combination tend to have more similar 
therapeutic effects and share more interaction partners. Based on these observations, we further develop a statistical
approach termed DCPred (Drug Combination Predictor) to predict possible drug combinations by exploiting the
topological features of the drug cocktail network. Validated on the known drug combinations, DCPred achieves
an overall AUC (Area Under the receiver operating characteristic Curve) score of 0.92, indicating the predictive
power of the proposed approach.

Conclusions: The drug cocktail network constructed in this work provides useful insights into the underlying rules
of effective drug combinations and offers important clues to accelerate the future discovery of new drug
combinations.



Background 

A drug combination is a combination of different agents
that can achieve better efficacy with fewer side effects
than its single components. Recently, it has
become a popular and promising strategy for new drug
discovery, especially for treating complex diseases, e.g.
cancer [1-3]. For example, Moduretic is the combination
of Amiloride and Hydrochlorothiazide, which is an 
approved combination used to treat patients with hyper- 
tension [4,5] . Chan et al. identified a combination drug, 
namely Tri-Luma, for combating melasma (dark skin 
patches) of the face based on efficacy and safety experi- 
ments [6]. Agrawal et al. found two effective combina- 
torial drug regimens to treat Huntington disease based 
on prescreening in Drosophila [7]. In addition, through 
the synergistic antiangiogenic effects, very low-dose 
combinatorial use of vinblastine (VBL) and rapamycin 
(RAP) was demonstrated to inhibit the proliferation of
the endothelial cells much more effectively than single 
drug treatment both in vitro and in vivo [8]. Recently,
Lehar et al. found that synergistic drug combinations
may have fewer side effects, because synergistic drug combinations
are generally more selective to particular cellular
contexts than single agents, and the dosage of each
compound in the combination is comparatively reduced [9].
Despite the extensive efforts that have been
made to discover new drug combinations in the past
few decades, the majority of effective combinatorial
drugs used in the clinic were discovered through experience,
which generally requires labor-intensive and time-consuming
"brute force" screening of all possible combinations
among the approved individual drugs [10]. In a
drug combination, a drug may promote or suppress the 
effect of another one. For instance, cyclosporine 
increases the effect of sirolimus, while bupropion 
decreases the effect of cyclosporine. As a result, two
drugs may have a totally new effect that is different
from that of either individual drug [11,12]. Accordingly,
the presence of potential drug-drug interactions
(DDIs) and the possibility of pharmacokinetic interventions
between the drugs could confound the identification
of effective drug combinations [13]. Furthermore,
the number of possible combinations grows combinatorially
with the number of available single drugs.
For example, four drugs already yield six
possible pairwise combinations. This number becomes enormous
considering that there are thousands of
approved drugs. Because of this huge search space of possible
combinations between known drugs, identifying
optimal and effective drug combinations is a
non-trivial and challenging task.
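
The growth of the search space can be made concrete with a short calculation. The sketch below is illustrative only; the function names are hypothetical and not part of the original study. It counts unordered pairs, and also combinations of any size of at least two drugs.

```python
from math import comb


def n_pairs(n_drugs: int) -> int:
    """Number of unordered two-drug combinations among n_drugs drugs."""
    return comb(n_drugs, 2)


def n_multi_drug_sets(n_drugs: int) -> int:
    """Number of combinations of any size >= 2 (all subsets minus the
    empty set and the n_drugs single-drug sets)."""
    return 2 ** n_drugs - n_drugs - 1


print(n_pairs(4))            # the four-drug example from the text
print(n_pairs(1000))         # pairs among a thousand drugs
print(n_multi_drug_sets(4))  # pairs, triples and the quadruple together
```

Pairwise combinations grow quadratically, whereas combinations of arbitrary size grow exponentially, which is why exhaustive screening quickly becomes impractical.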

Therefore, it is necessary to develop effective in silico
methods that are capable of discovering new drug combinations
prior to combination synthesis and practical
testing in the lab. Owing to the completion of the human genome
sequencing projects and the advancement of molecular
medicine, extensive systems biology efforts have
been made to discover new combinations based on
molecular interaction networks [14,15] in the past few
years [16-19]. Nevertheless, there is still a long way to
go before we reach the stage of devising generally
applicable and effective prediction models. Recently,
there has been considerable progress in developing
new approaches for identifying drug-drug interactions
and even drug combinations [13]. In this context, Geva-Zatorsky
et al. have recently found that the protein
dynamics in response to drug combination can be accu- 
rately described by a linear superposition of the 
dynamics under the corresponding individual drugs [16]. 
Their study indicated that protein dynamics of three- 
and four-drug combinations can be predicted based on 
the drug combination pairs, thereby providing a useful 
way for reducing the search space of possible drug com- 
binations. Calzolari et al. devised an efficient search 
algorithm originated from information theory for opti- 
mization of drug combinations based on the sequential 
decoding algorithms [17]. More recently, researchers 
have also developed computational frameworks for pre- 
dicting drug combinations and synergistic effects based 
on high-throughput data [18-20]. 

In this work, we study drug combinations in terms
of their therapeutic similarity and the network topology
of a drug cocktail network constructed from the effective
drug combinations deposited in the Drug Combination
Database (DCDB) [21]. We find that the drugs in
an effective combination tend to have more similar therapeutic
effects and share more interaction partners in
the drug cocktail network. We further
develop a statistical approach called DCPred to predict
possible drug combinations and validate this approach
on a benchmark dataset of all the known
effective drug combinations. DCPred
achieves a best overall AUC (Area Under the receiver
operating characteristic Curve) score of 0.92, demonstrating
the predictive capability of the proposed
approach and its potential value in identifying new
drug combinations.

Results and discussion 

The drug cocktail network 

In this study, we extracted 239 known effective pairwise
drug combinations from DCDB [21]. The ATC codes
for each drug were obtained from DrugBank
[22]. Based on these datasets, we constructed a drug
cocktail network with 215 nodes and 239 edges (see Figure 1
for a visualization of this network), where nodes
represent drugs and an edge connects two
drugs if they are found in an effective drug combination.
This network gives a visual
impression of the relationships between drugs that can
form effective combinations. Moreover, network theory
can be used to explore possible combinatorial
mechanisms between drugs. In Figure 1, the size of each
node approximates its degree, and the width of each
edge approximates the therapeutic similarity (TS) (as
defined in Equation 3) between the two drugs linked by
the edge, while grey edges indicate that the two
linked drugs have totally different therapeutic
effects. In addition, we found 102 drugs that have
at least two neighbors in the drug cocktail network,
which we term "star drugs" hereafter; 91 of
them have target protein annotations in DrugBank.
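
As a minimal sketch of this construction, the network can be held as an adjacency map built from drug pairs. The drug labels below are placeholders, and the helper names `build_network` and `star_drugs` are hypothetical, not taken from the original work.

```python
from collections import defaultdict


def build_network(pairs):
    """Return {drug: set of neighbor drugs} from unordered drug pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore degenerate self-pairs
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def star_drugs(adj, min_neighbors=2):
    """Drugs with at least `min_neighbors` neighbors in the network."""
    return {drug for drug, nbrs in adj.items() if len(nbrs) >= min_neighbors}


# Placeholder pairs standing in for the pairs extracted from DCDB.
toy_pairs = [("d1", "d2"), ("d1", "d3"), ("d4", "d5")]
toy_adj = build_network(toy_pairs)
print({drug: len(nbrs) for drug, nbrs in toy_adj.items()})  # node degrees
print(star_drugs(toy_adj))
```

The degree of a drug is simply the size of its neighbor set, so hub drugs can be selected the same way with a higher `min_neighbors` value.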

Since most biological networks are scale-free networks
[23], we analyzed the topology of the drug cocktail
network to find out whether it is also scale-free.
The degree distribution of the drug cocktail
network is shown in Figure 2. The degree
distribution follows a power law, suggesting
that it is indeed a scale-free network. That is, the fraction
P(x) of nodes in the drug cocktail network having x connections
to other nodes can be described as:

$$P(x) \propto c\,x^{-\alpha} \qquad (1)$$

where c = 2.1 and α = 1.9 in this case.
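
A power-law exponent of this kind is commonly estimated by fitting a straight line to the degree distribution on log-log axes, as in Figure 2. The following sketch assumes an ordinary least-squares fit on common logarithms; the exact fitting procedure used in the study is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Map each degree k to the fraction of nodes having that degree."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in counts.items()}


def fit_power_law(dist):
    """Least-squares line through (log10 k, log10 P(k)).

    Returns (slope, intercept); the power-law exponent is -slope.
    Requires at least two distinct positive degrees.
    """
    xs = [math.log10(k) for k in dist]
    ys = [math.log10(p) for p in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Applied to the degrees of the drug cocktail network, the slope and intercept would correspond to the fitted line reported in Figure 2.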

As the drug cocktail network shown in Figure 1 is not
fully connected, the six largest subnetworks were chosen
for further analysis. Hereafter, we take the drug cocktail
network to be the union of these six subnetworks
unless stated otherwise. In particular, each subnetwork
was found to be enriched for one or several therapeutic
classes according to the ATC classification system, as
shown in Table 1. In other words, drugs having
similar therapeutic effects tend to be clustered together
in the drug cocktail network.
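
The subnetworks can be obtained as connected components of the network, and enrichment can then be checked with the rule given in the legend of Table 1 (a code occurring more than 10 times, or accounting for more than 40% of the codes in the subnetwork). This is a sketch under those assumptions; `connected_components`, `enriched_codes` and the `level1_codes` mapping are hypothetical names.

```python
from collections import Counter


def connected_components(adj):
    """Connected components of {node: set(neighbors)}, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def enriched_codes(component, level1_codes, min_count=10, min_fraction=0.4):
    """First-level ATC codes that are frequent within one component.

    level1_codes maps each drug to a list of first-level ATC letters.
    """
    counts = Counter(
        code for drug in component for code in level1_codes.get(drug, ())
    )
    total = sum(counts.values())
    return {
        code: n for code, n in counts.items()
        if n > min_count or n / total > min_fraction
    }
```

Taking the first six entries of `connected_components(adj)` would give the six largest subnetworks analyzed in the text.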
Figure 1 The drug cocktail network. A node represents a drug and an edge denotes an effective combination consisting of the two drugs
linked by the edge. The hub drugs that have more than 6 neighbors are colored in red. The size of each node approximates its degree, the
width of each edge approximates the therapeutic similarity (see Equation 3) between the two drugs linked by the edge, and a grey edge means
that the two drugs linked by that edge have completely different therapeutic effects. The numbers 1-6 mark the six largest
child networks of the drug cocktail network.
Figure 2 The degree distribution of the drug cocktail network. The x-axis represents the common logarithm of the degree k, while
the y-axis represents the common logarithm of the fraction of drugs that have degree k. The fitted line is y = -1.9x + 2.1 (R² = 0.9408).



To test our hypothesis that the drugs in one combination
tend to have similar therapeutic effects, the drug
cocktail network was compared against random combination
networks. For this purpose, a therapeutic similarity
(TS) score was calculated for each drug pair, and the
average of all TS scores was used as the TS score of
the whole drug cocktail network. The random combination
networks were generated by randomly shuffling the
edges while preserving the degree of each node
[24] in the drug cocktail network. This procedure was
repeated 1,000 times. To examine the statistical significance
of the difference between the drug cocktail
network and the random combination networks, an empirical
P-value was calculated as the fraction of the 1,000 randomizations
in which the TS score of the random
combination network was larger than that of the drug
cocktail network. The
results are shown in Table 2 at different ATC code
Table 1 The enriched ATC codes for child networks

Subnetwork   Number of drugs   Enriched ATC codes: Frequency
1            84                L:40, J:24, A:16, S:11
2            29                C:28
3            17                N:8, M:7
4            9                 J:9
5            7                 N:7
6            5                 J:5

The enriched therapeutic effects represented by the ATC codes (first level) for
the six largest child networks, where the numbering of each child
network is consistent with that shown in Figure 1. Here, enriched ATC
codes are those that occur frequently, either more than 10 times or
accounting for more than 40% of all ATC codes assigned to the drugs in the
child network.



levels ranging from 1 to 4. The calculated P-values of
the drug cocktail network across ATC code levels 1-4
are all 0 (0 out of 1,000 randomizations, i.e. P < 0.001), strongly suggesting that the real drug
combinations differ significantly from the random combination
networks. Note that the 5th ATC code level
was not considered here, as only one drug combination
in the drug cocktail network has identical ATC codes at all five
levels, which makes the
5th level unsuitable for statistical
analysis.
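
The randomization test described above can be sketched as follows. The code assumes that degree-preserving shuffling is done by repeated double-edge swaps, a common choice; the source cites [24] without giving implementation details. The function names, the number of swaps per random network and the `ts` callable are all assumptions made for illustration.

```python
import random


def shuffle_edges(edges, n_swaps, rng):
    """Degree-preserving rewiring of a simple graph by double-edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create self-loops or duplicate edges.
        if len(new1) < 2 or len(new2) < 2 or new1 == new2:
            continue
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def mean_ts(edges, ts):
    """Average therapeutic similarity over all edges."""
    return sum(ts(a, b) for a, b in edges) / len(edges)


def empirical_p_value(edges, ts, n_random=1000, seed=0):
    """Fraction of random networks whose mean TS exceeds the observed one."""
    rng = random.Random(seed)
    observed = mean_ts(edges, ts)
    larger = 0
    for _ in range(n_random):
        randomized = shuffle_edges(edges, 10 * len(edges), rng)
        if mean_ts(randomized, ts) > observed:
            larger += 1
    return larger / n_random
```

An empirical P-value of 0 out of 1,000 randomizations means only that no random network exceeded the observed score, so it is best read as P < 0.001.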

Furthermore, we studied the therapeutic effects for the 
"star drugs" and their neighbors in the drug cocktail 
network in order to reveal whether the star drugs have 
therapeutic similarities to all their neighbors. Figure 3 
shows the distribution of the TS scores for star drugs 
and their neighbors. For the effective combination pairs
involving star drugs, 82% have therapeutic similarity,
and most of the star drugs have therapeutic
effects similar to those of the majority of their neighbors. In contrast,
78% of the combination pairs in the random network do
not have any therapeutic similarity. These results suggest
that a star drug tends to be used in combination



Table 2 The comparisons between drug cocktail network
and random networks

ATC code level   1        2        3        4
P-value          0/1000   0/1000   0/1000   0/1000

The P-value is the fraction of the 1000 randomization tests in which the therapeutic similarity (TS) score of the
random network is larger than that of the drug cocktail network, at different ATC code levels.
Figure 3 The distribution of the TS scores between star drugs and their neighbors. Blue and red lines represent the drug cocktail network
and random network, respectively.



with drugs that have similar therapeutic effects as the 
star drug. 

Moreover, we also investigated the distribution of 
neighbor drug pairs of star drugs (Figure 4A and 4B), 
attempting to answer whether or not the drug pairs that 
share a star drug have therapeutic similarity. To address 
this, we divided the neighbor drug pairs of a star drug 
into two groups, according to whether they have similar 
ATC codes, or whether they are approved effective com- 
binations. We then calculated the percentage of effective 
combinations among drug pairs that share a star drug 
and have a TS score equal to or larger than a certain 
threshold (Figure 4C). From Figure 4C, we can see that
the more similar the therapeutic effects (as reflected by the
TS score) of two drugs are, the more likely they are to form an effective
combination. Another important observation is
that drugs that have similar
therapeutic effects and share a star drug are more likely to form effective
combinations.
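
The quantity plotted in Figure 4C can be approximated along the following lines. This is a hedged sketch: `neighbor_pairs`, `effective_fraction` and the `ts` callable are hypothetical names, and the adjacency map is assumed to hold every neighbor relation in both directions.

```python
from itertools import combinations


def neighbor_pairs(adj):
    """Unordered drug pairs that share at least one common neighbor."""
    pairs = set()
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            pairs.add((a, b))
    return pairs


def effective_fraction(adj, ts, threshold):
    """Among neighbor pairs with TS >= threshold, the fraction that are
    themselves linked in the network (i.e. known effective combinations).
    Returns None when no pair reaches the threshold."""
    candidates = [p for p in neighbor_pairs(adj) if ts(*p) >= threshold]
    if not candidates:
        return None
    hits = sum(1 for a, b in candidates if b in adj.get(a, ()))
    return hits / len(candidates)
```

Evaluating `effective_fraction` over a range of thresholds gives a curve comparable to the one shown for the drug cocktail network.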

In various networks, hub nodes are generally considered
to play important roles [25]. Therefore, we next
studied the 14 hub drugs in the drug cocktail network,
all of which have more than 6 neighbor drugs. The two
largest hub drugs are DB00999 (Hydrochlorothiazide)
and DB00072 (Trastuzumab). Hydrochlorothiazide is
used to treat high blood pressure and edema [26,27].
According to the annotations in DrugBank and DCDB,
all 18 drug neighbors of hydrochlorothiazide
can be used to treat hypertension, and all the
drug combinations involving hydrochlorothiazide have
been used to treat hypertension. Among these 18
combinations, 11 target different
but related pathways while the other 7 target unrelated
pathways (Additional file 1). Trastuzumab is
used to treat HER2-positive metastatic breast
cancer [28,29]; 5 of its 10 neighbor drugs are used to
treat breast cancer, while the other 5 act
against neoplasms or other cancers. All 10 drug
combinations are used to treat breast cancer except
one, which is used for treating gastric cancer. Additionally, 8 drug
combinations target related pathways, while the other
two target unrelated pathways or cross-talking
pathways (Additional file 2). These results,
together with the consistent findings shown in Figure 3,
strongly indicate that star drugs tend to have
therapeutic characteristics similar to those of their neighbors.

In addition, we investigated the proteins targeted by
the 13 hub drugs in the drug cocktail network that have
target information. By mapping all proteins targeted by
the drugs in the drug cocktail network to the human
protein-protein interaction network retrieved from
the STRING database [30], we found that, in terms of the
shortest distance between target proteins, hub drugs
tend to be closer to their combination
partners than to drugs having similar ATC codes
(see Figure 5A). Furthermore, we analyzed the cellular
localizations of these target proteins of the 13 hub drugs 
(see Figure 5B). More than 70% of the target proteins of 
the hub drugs are membrane proteins, which is reason- 
able considering that membrane proteins are widely 
involved in various biological processes and represent 
the largest class of drug targets. 
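
The shortest distance between the targets of two drugs in a protein-protein interaction network can be computed with a breadth-first search. The sketch below assumes the distance between two drugs is the minimum path length between any pair of their targets; the source does not specify how multiple targets were aggregated, and the function name and protein labels are hypothetical.

```python
from collections import deque


def shortest_target_distance(ppi, targets_a, targets_b):
    """Minimum number of interactions separating any target of drug A from
    any target of drug B; None if no path exists.

    ppi maps each protein to the set of proteins it interacts with.
    """
    goal = set(targets_b)
    seen = set(targets_a)
    queue = deque((protein, 0) for protein in seen)
    while queue:
        protein, dist = queue.popleft()
        if protein in goal:
            return dist
        for nb in ppi.get(protein, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None
```

A distance of 0 means the two drugs share a target, and a distance of 1 means their targets interact directly.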
Figure 4 Star drugs and their neighbors. (A) The distribution of neighbor drug pairs of star drugs. The neighbor pairs of star drugs can be
classified into two groups, according to whether they have similar ATC codes, or whether they are used as effective combinations. (B) Schematic
view of the relationship between two neighbors d1 and d2 of a star drug. (C) The percentage of effective combinations within neighbor drug
pairs with TS equal to or larger than a certain threshold. Blue and red lines represent the drug cocktail network and the average of 1000
randomly generated combination networks, respectively.



Implication of drug cocktail network for possible drug 
combinations 

As shown in Figure 3, 82% of the combinations between 
star drugs and their neighbors have therapeutic similar- 
ity, and most of the star drugs have therapeutic similar- 
ity to the majority of their neighbors in the drug 
cocktail network. Additionally, most of the effective 
combinations are observed to be located in the vicinity 
of drug pairs with similar ATC codes. Hence, it is possi- 
ble to predict drug combinations from the set of drug 
pairs with similar ATC codes. Nonetheless, we found 
that there are only 74 known effective combinations in 
all of the 1181 possible combinations with similar ATC 
codes. Since the number of effective drug combinations
is considerably smaller than that of random combinations
between drugs having similar ATC codes, it is a
challenging but crucial task to discover the effective
combinations within this large pool of random
combinations.

In Figure 4B and 4C, we can see that if two drugs
with similar ATC codes have a common neighbor in the
drug cocktail network, they are more likely to be combined.
Therefore, we assume that two
drugs having similar ATC codes and sharing a significantly
larger number of common partners in the drug
cocktail network are more likely to be combined effectively.
Based on this assumption, we developed a
new statistical approach called DCPred
and applied it to predict and rank all the
possible drug combinations (see Methods
for more details). In particular, three different versions
of DCPred were considered in this work, including
Figure 5 Target proteins of hub drugs. (A) The blue line represents the shortest distances between the targets of hub drugs and the targets
of their combination partners, while the red line represents the shortest distances between the targets of hub drugs and the targets of drugs 
that are therapeutically similar to hub drugs. (B) The distribution of cellular localizations of the target proteins of hub drugs. 



DCPred1 considering TS only, DCPred2 considering TS
and drugs with at least 2 neighbors, and DCPred3 considering
TS and drugs with at least 3 neighbors. In the
case of DCPred2 and DCPred3, all possible drug combinations
were ranked in ascending order according to the
p-value given by Equation (4), and the top ones were considered
putative effective drug combinations. In
the case of DCPred1, all possible drug combinations
were ranked in descending order according to the TS
value given by Equation (3), and the top ones were considered
putative effective drug combinations. The ranking lists
of drug combinations can be found in the additional
files (Additional files 3 and 4). We found that two drugs
with more common neighbors generally have higher
rankings. Using the set of 74 effective combinations as
the gold standard and the 1107 random ones as the negative
set (Additional file 3), we evaluated our approach in
identifying new drug combinations. Figure 6 shows the
ROC curves [31] obtained by the different methods, where
the drug pairs ranked above a given threshold were predicted
as effective drug combinations (positives), while
the rest were regarded as negatives. We then calculated
the area under the ROC curve (AUC) [32] for these different
DCPred models. DCPred2 achieved
an AUC score of 0.88 (the green curve in Figure 6), in
comparison with an AUC of 0.75 for the TS-based
method (DCPred1) (the red curve in Figure 6). To comprehensively
evaluate the predictive power of the three
models, we also calculated three other performance
indexes, Sensitivity, Specificity and Accuracy, at varying
thresholds for the DCPred1, DCPred2 and DCPred3 models
(see Additional files 5, 6 and 7, respectively).
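
For readers who want to reproduce this kind of evaluation, the AUC and the threshold-based indexes can be computed as sketched below. The code uses the standard rank-based (Mann-Whitney) formulation of the AUC and is not the authors' implementation; the function names are hypothetical. Scores must be oriented so that higher means more likely positive, so a p-value ranking would be passed as negated p-values.

```python
def roc_auc(scores, labels):
    """Probability that a random positive scores higher than a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity and accuracy when pairs with
    score >= threshold are predicted as effective combinations."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }
```

The pairwise loop is quadratic in the number of pairs, which is acceptable for a benchmark of the size used here (74 positives and 1107 negatives).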
Of the top 35 ranked drug combinations inferred by our
models, 63% (22/35) are known effective drug
combinations according to DCDB, and 37% (13/35) do
not have any annotations in DCDB (Table 3). Nevertheless,
4 of these 13 drug combinations have been reported
in the literature, i.e. those ranked 13th, 22nd, 34th and 35th in the
ranking list (Table 3). The 34th-ranked one is a combination
of irinotecan and capecitabine, known as XELIRI,
and used to treat metastatic colorectal cancer [33].
Alfonso et al. demonstrated that XELIRI is effective and
safe as first-line chemotherapy for treating advanced
or metastatic colorectal cancer [34].
The 13th-ranked one is the combination of docetaxel
and gemcitabine; the former interferes with the normal
function of microtubule growth and destroys the cell's
ability to use its cytoskeleton in a flexible manner, while
the latter inhibits thymidylate synthetase, leading to
inhibition of DNA synthesis and cell death [35,36]. Levy
et al. found that the gemcitabine-docetaxel combination has
a favorable risk-benefit profile and is an important new
treatment option for women with metastatic breast cancer
[37]. The 22nd-ranked one is the combination of sorafenib
and bevacizumab. The former interacts with multiple
intracellular (CRAF, BRAF and mutant BRAF) and cell
surface kinases (KIT, FLT-3, VEGFR-2, VEGFR-3, and
PDGFR-β) to reduce blood flow to the tumor for the
treatment of patients with advanced renal cell carcinoma
[38], while the latter binds VEGF and prevents the interaction
of VEGF with its receptors (Flt-1 and KDR) on the
surface of endothelial cells [39]. Consequently, it prevents
blood vessel proliferation and tumor metastasis
in metastatic colorectal cancer and HER2-negative
metastatic breast cancer. Azad et al. demonstrated that
complementary inhibition of VEGF signaling has synergistic
therapeutic effects, and that this combination therapy
has promising clinical activity in ovarian cancer [40].
The 35th-ranked one is the combination of thalidomide and
lenalidomide. Thalidomide has been successfully introduced
to treat multiple myeloma and its analogue, lenalidomide,
is also effective in relapsed refractory
myeloma [41]. The thalidomide-lenalidomide combination
can induce tumour cell apoptosis directly or indirectly
by altering the bone marrow microenvironment, and
can be used in combination to treat multiple myeloma
Table 3 The novel predictions of DCPred2

Rank  Drug components (DrugBank IDs)  Drug components (names)        P-value      Common targets  Reported effective combination?
10    DB00275; DB00177                Olmesartan; Valsartan          4.35E-05     AGTR1           No
11    DB00275; DB00966                Olmesartan; Telmisartan        4.35E-05     AGTR1           No
12    DB00177; DB00966                Valsartan; Telmisartan         4.35E-05     AGTR1           No
13    DB01248; DB00441                Docetaxel; Gemcitabine         4.83E-05     None            Yes
14    DB00526; DB01248                Oxaliplatin; Docetaxel         4.83E-05     None            No
22    DB00398; DB00112                Sorafenib; Bevacizumab         0.000130406  None            Yes
23    DB00741; DB00850                Hydrocortisone; Prednisolone   0.000130406  None            No
24    DB01132; DB00412                Pioglitazone; Rosiglitazone    0.000130406  PPARG           No
25    [missing]                       Lenalidomide; Cytarabine       0.000130406  [missing]       No
26    DB00398; DB00002                Sorafenib; Cetuximab           0.000130406  None            No
27    DB00564; DB00776                Carbamazepine; Oxcarbazepine   0.000130406  SCN5A           No
34    DB00762; DB01101                Irinotecan; Capecitabine       0.000260813  None            Yes
35    DB00480; DB01041                Lenalidomide; Thalidomide      0.000260813  PTGS2           Yes

Novel predictions (not reported in DCDB) in the top 35 ranked drug combinations, where only drugs having at least 2 neighbors were included, resulting in 74
positive drug combinations and 1107 random ones.



[42]. Both drugs bind to a common target, PTGS2,
which may act as a major mediator of inflammation
and/or play a role in prostanoid signaling in activity-dependent
plasticity [43]. Thalidomide and lenalidomide
have been shown to significantly improve overall
and disease-free survival. The combination of these two
drugs has recently emerged as a promising
strategy to improve patient outcome and reduce drug toxicity,
especially in the treatment of multiple myeloma
(MM) and hematologic cancers [44].

If we only considered the combinations whose drug
components have at least 3 neighbors, termed
DCPred3 (the blue curve in Figure 6), we obtained 40
positive combinations and 379 negative ones (Additional file 4).
DCPred3 achieves an AUC score of 0.92. Compared
with the aforementioned two models, DCPred1 and
DCPred2, DCPred3, which is restricted to drugs with at least 3 neighbor
drugs, gives the best overall performance.
In this work, we considered the results of DCPred2 as
the final results because only a few drugs have more than
two neighbors in the drug cocktail network. We hope
that the DCPred models developed in this study can be
used to facilitate the in silico identification of effective
drug combinations and speed up the future discovery
process.



Conclusions 

Drug combination is a promising strategy for combating
complex diseases, but a complete understanding of the
underlying mechanisms of drug combination is largely
lacking at present. It is therefore imperative to develop
efficient computational methods to infer effective drug
combinations in order to reduce labor-intensive, time-consuming
trial-and-error experiments. In this article, we
extracted all the known effective drug combinations from 
DCDB and constructed a drug cocktail network, which 
includes 215 drugs and 239 effective drug combinations. 
Based on this cocktail network, we observed that the star 
drugs tend to have therapeutic similarity with their drug 
neighbors, and two drugs having similar therapy and 
sharing neighbors tend to be employed in drug combina- 
tion. Our analysis also revealed that: 1) hub drugs usually 
have similar and even the same therapeutic effects as 
their neighbors; 2) target proteins of the hub drugs are 
often membrane or membrane-associated proteins; 3) the 
components in effective drug combinations usually have 
more similar therapeutic effects, making the drug cocktail 
network significantly different from the random combi- 
nation networks. 

From the above observations, we developed
a new statistical approach to infer and rank
possible effective drug combinations by taking into
account drugs with at least two or three drug neighbors.
Our DCPred2 and DCPred3 models achieved
AUC scores of 0.88 and 0.92, respectively, demonstrating
good performance. We further applied these
models to rank all the possible drug combinations and 
found that the top ranked combinations are more likely 
to be effective combinations, according to the cross- 
reference to the literature or the similarity of their ATC 
codes. In particular, four combinations in the top 35 
rankings have been verified as effective combinations by 
the literature search. We also show that there is a better 
chance for another 3 combinations to be effective com- 
binations in terms of the pharmacological similarity. 
Our results in this study provide useful insights into the 
underlying mechanisms of effective drug combinations 
and hence important clues for efficiently reducing the 
search space of possible combinations within the 
approved drugs. Our approach may be further useful for 
developing more accurate models. The DCPred models 
are anticipated to be applied to screen more effective 
drug combinations with clinical importance. 

Furthermore, the concentration of each drug in a 
combination is a crucial factor in the study of drug 
combination. However, it is currently difficult to utilize 
the dosage information of drugs without the knowledge 
of their quantitative dose-response profiles (e.g. drug 
induced gene/protein expression data) under different 
drug concentrations, due to the limited availability of 
such data. We will investigate drug combinations from 
this perspective in the future, when more data regarding 
drug concentrations become available. 

Methods 

Data sources 

The annotations of drug combinations were retrieved
from the newly released Drug Combination Database
(DCDB) [21], a major resource that collects
effective drug combinations from the literature. The target
protein information, the Anatomical Therapeutic
Chemical (ATC) code annotations of the drugs and the protein
subcellular localizations were extracted from DrugBank
[22]. Drug combinations whose drug components do not have ATC
codes and combinations
with no or unclear efficacy were discarded.
Finally, 194 effective drug combinations were obtained,
including 76 approved combinations, 64 clinical combinations
and 54 preclinical combinations. We then split
the combinations with more than two drug components
into combination pairs, resulting in 239 drug combination
pairs. These drug combinations were used to construct
a drug cocktail network (Figure 1), where the
nodes represent drugs and the edges represent combinations.
In the drug cocktail network, the size
of each node denotes its degree and the width of each
edge denotes the therapeutic similarity (TS) between the
two drugs linked by the edge. A gray edge means that
there is no therapeutic similarity between the two drugs.
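
Splitting combinations with more than two components into pairs amounts to enumerating all unordered pairs within each combination. A minimal sketch, with placeholder drug labels and a hypothetical function name:

```python
from itertools import combinations


def to_pairs(drug_combinations):
    """Expand each combination (an iterable of drug names) into the set of
    unordered drug pairs it contains; duplicate pairs are merged."""
    pairs = set()
    for drugs in drug_combinations:
        for a, b in combinations(sorted(set(drugs)), 2):
            pairs.add((a, b))
    return pairs


# A three-drug combination yields three pairs; a repeated pair is kept once.
print(to_pairs([("A", "B", "C"), ("A", "B")]))
```

Because pairs from different combinations can coincide, the number of resulting pairs depends on the overlap between combinations and not only on their sizes.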

Human protein-protein interactions (PPIs) with high
confidence from STRING [30] were used to annotate
this drug cocktail network. After removing
pairs with low scores (< 700), the PPI network includes 169,603
interactions between 11,289 proteins.

Drug therapeutic similarity 

The Anatomical Therapeutic Chemical (ATC) Classification
System, which includes 5 different hierarchical
levels, was used to classify drugs into different groups
according to the organ they act on and their therapeutic
and chemical characteristics. The k-th level drug therapeutic
similarity (S_k) between two drugs is defined using
the ATC codes of these two drugs:

$$S_k(d_1, d_2) = \frac{|ATC_k(d_1) \cap ATC_k(d_2)|}{|ATC_k(d_1) \cup ATC_k(d_2)|} \qquad (2)$$

where ATC_k(d) denotes all the ATC codes at the k-th
level of drug d. Note that a drug has five levels of ATC
codes. A score, TS, is used to define the therapeutic
similarity between two drugs:

$$TS(d_1, d_2) = \sum_{k=1}^{n} S_k(d_1, d_2) \qquad (3)$$

where n ranges from 1 to 5. In this study, n = 3 is
adopted considering that only a few drugs have the
same ATC codes at the 5th level.
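
The two definitions above can be sketched in code as follows. The sketch relies on the standard ATC code layout, in which levels 1 to 5 correspond to code prefixes of 1, 3, 4, 5 and 7 characters, and it takes TS as the plain sum of the level-wise similarities over levels 1 to n, which is an assumption about normalization; dividing by n would rescale the score without changing any ranking. The function names and the example codes are illustrative only.

```python
# Prefix length of an ATC code at each of the five hierarchical levels.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_at_level(codes, level):
    """Set of distinct level-`level` prefixes among a drug's ATC codes."""
    length = ATC_PREFIX_LENGTH[level]
    return {code[:length] for code in codes if len(code) >= length}


def level_similarity(codes1, codes2, level):
    """Overlap of the two drugs' level-k code sets (intersection / union)."""
    a, b = atc_at_level(codes1, level), atc_at_level(codes2, level)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def therapeutic_similarity(codes1, codes2, n=3):
    """Sum of the level-wise similarities over ATC levels 1..n."""
    return sum(level_similarity(codes1, codes2, k) for k in range(1, n + 1))


# Two ATC-style codes that agree at levels 1 and 2 but differ at level 3.
print(therapeutic_similarity({"C03AA03"}, {"C03DB01"}))
```

Drugs with several ATC codes are handled through the set overlap, so a partial match at a level contributes a value between 0 and 1.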

Drug combination prediction 

We assume that two drugs are more likely to be combined
if they share a large number of common neighbor drugs in
the drug cocktail network. For example, if two drugs d_1
and d_2 with n_1 and n_2 partners, respectively, have m in
common in the drug cocktail network, there will be
three groups in the neighborhood of the two drugs, i.e.
(1) m drugs that are the neighbors of both drug d_1 and
d_2; (2) n_1 - m partners that are the neighbors of drug d_1
only; and (3) n_2 - m partners that are the neighbors of drug
d_2 only [45]. Suppose that there are N drugs in total in
the drug combination network; then a p-value between
d_1 and d_2 can be calculated using the following equation:

$$P(m, n_1, n_2, N) = \frac{\binom{N}{m}\binom{N-m}{n_1-m}\binom{N-n_1}{n_2-m}}{\binom{N}{n_1}\binom{N}{n_2}} \qquad (4)$$
If two drugs share more common neighbors than expected
given all of their neighbors, the p-value computed by Equation
(4) will be closer to 0, which means they are more likely to
be combined. We use Equation (4) to compute the p-values
for all possible combinations and then rank the
values in ascending order. As drug pairs with lower p-values
are more likely to be combined, effective drug combinations
can be predicted given a certain
p-value threshold. We term this framework, which explores
the drug cocktail network and predicts possible drug combinations,
DCPred (Drug Combination Predictor) and
assess its performance for inferring effective drug combinations
on the curated drug combination dataset.
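
The scoring and ranking step can be sketched as below, reading Equation (4) as the hypergeometric probability that two drugs with n_1 and n_2 neighbors share exactly m of them among N drugs. That reading is an assumption, although with N = 215 and m = n_1 = n_2 = 2 it gives about 4.35E-05, in line with the smallest values listed in Table 3. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def shared_neighbor_p(m, n1, n2, N):
    """Probability that two drugs with n1 and n2 neighbors share exactly m
    neighbors when neighbors are drawn at random from N drugs."""
    numerator = comb(N, m) * comb(N - m, n1 - m) * comb(N - n1, n2 - m)
    return numerator / (comb(N, n1) * comb(N, n2))


def rank_pairs(adj, candidate_pairs):
    """Score candidate drug pairs and sort them by ascending p-value.

    adj maps each drug to its set of neighbors; N is taken as the number
    of drugs in the network.
    """
    N = len(adj)
    scored = []
    for a, b in candidate_pairs:
        m = len(adj[a] & adj[b])
        p = shared_neighbor_p(m, len(adj[a]), len(adj[b]), N)
        scored.append((p, a, b))
    return sorted(scored)


print(shared_neighbor_p(2, 2, 2, 215))
```

Restricting `candidate_pairs` to drugs with at least 2 or 3 neighbors would correspond to the DCPred2 and DCPred3 settings described in the text.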